DOCUMENT RESUME 



ED 244 980 TM 840 289 



AUTHOR Gipps, Caroline; Goldstein, Harvey

TITLE Local and National Testing in the UK: The Last Ten 

Years.

PUB DATE Apr 84 

NOTE 24p.; Paper presented at the Annual Meeting of the

American Educational Research Association (68th, New
Orleans, LA, April 23-27, 1984).

PUB TYPE Speeches/Conference Papers (150) Reports - 

Descriptive (141) 

EDRS PRICE MF01/PC01 Plus Postage. 

DESCRIPTORS Academic Standards; College Entrance Examinations;

Educational Assessment; *Educational Testing;
*Educational Trends; Foreign Countries; *Handicap
Identification; Instructional Improvement; *National
Programs; *School Districts; Screening Tests; Testing
Problems; Testing Programs; Test Interpretation

IDENTIFIERS *Assessment of Performance Unit (United Kingdom);

*United Kingdom



ABSTRACT 

New developments in testing in the United Kingdom
(UK) since 1965 are described. Standardized testing at the local
level declined dramatically with the widespread introduction of
comprehensive secondary education. However, in the late 1970s
widespread local testing programs were re-introduced for the purposes
of monitoring student progress, screening students to identify those
in need of special help, or providing information for transfer from
junior to senior school. A national testing program, the Assessment
of Performance Unit (APU), was established in 1974. It is designed to
assess achievement in language, math, science, and modern language.
The emphasis of the APU has shifted away from its original purpose of 
providing information relevant to policy making and resource 
allocation toward providing detailed information to guide teaching 
practice. In the UK, there are also two types of public examination:
the General Certificate of Education Ordinary Level (at age 16) and 
Advanced Level (at 18), and the Certificate of Secondary Education 
(at 16) for the less academic student. These examinations are set by 
various examination boards, and with such a diverse system, there are 
questions over comparability and confusion over whether the grades 
awarded are norm-referenced or criterion-referenced. (BW)



***********************************************************************

* Reproductions supplied by EDRS are the best that can be made *

* from the original document. *

***********************************************************************






LOCAL AND NATIONAL TESTING IN THE UK:






THE LAST TEN YEARS 



Caroline Gipps and Harvey Goldstein
University of London Institute of Education 
London U.K.



Paper presented to the American Educational Research Association 
Annual Meeting, New Orleans, April 1984.



Author Address: Dr Caroline Gipps & Professor Harvey Goldstein
Institute of Education
18 Woburn Square
London WC1H 0NS
United Kingdom



Local and National Testing in the UK: The Last Ten Years
Caroline Gipps and Harvey Goldstein

Local Testing Programmes

In the period after 1965 standardised testing at local level in this
country declined dramatically following Government Circular 10/65, the
document which heralded the widespread introduction of comprehensive
secondary education. Previously the 11+ test, a series of IQ, English and
maths tests, had been used to allocate children to academic (grammar) or
non-academic (secondary modern) secondary schools. As the process of
comprehensivisation proceeded the need for selection declined. However,
some school districts (local education authorities - LEAs) continued with
attainment and ability testing in order to ensure a 'balanced' intake into
secondary schools, for example the Inner London Education Authority.
Other LEAs never entirely abandoned selection, retaining one or two grammar
schools for which entry was determined through testing, and a handful
retained a completely selective system. Thus group testing did not disappear
altogether, but in the mid to late 70s several events, social, political
and educational, led to an increase in concern over education which
culminated in the introduction of widespread testing programmes
around 1978. The significant events were: local authority reorganisation
in 1974; the Bullock Report in 1975; the 'Black Papers' in 1975 and 1977;
the Prime Minister's Ruskin College speech in 1976; and the William Tyndale
School report in 1976. We shall deal with each of these events in turn.

Following local authority reorganisation new school districts were 
created and as a result, administrators and professionals in areas which 
had taken on new schools felt that they did not have a grasp of the levels
of attainment of the new children and schools in their area. This was
compounded by the fact that with the ending of the 11+ there was no
information on levels of performance of children leaving primary
(elementary school from 1st through 7th grade) school. The reduction of
11+ testing had, in fact, had considerable impact at the primary school
level. Freed from the constraints of this leaving exam, the primary school
curriculum underwent a revolution, with child-centred approaches,
discovery learning and an emphasis on individual or small group teaching.
The reduced emphasis on traditional methods of teaching the basic skills 
had its critics and the Black Papers edited by Dr Rhodes Boyson (now
Secretary of State for Social Services, previously a minister of Education)
argued that modern methods in the primary school, and non-selective
secondary education, were resulting in a lowering of 'standards'. The
Bullock Committee on the teaching of English was set up because of concern
over standards of literacy, and one of the Committee's recommendations was
that LEAs should monitor reading levels regularly through the use of
standardised group reading tests. This report, produced by a committee of
highly regarded professionals, became one of the most influential documents
of the 70s and effectively it set the seal of approval on testing by LEAs.
This was just as well, for with the increased calls for information on
standards by politicians at this time came an event which was known as the
'William Tyndale affair'. A primary school in London which ran a progressive
regime became the focus of concern for some of the parents and staff.
There was an enquiry into the running of the school and the Head and some
of its staff were suspended from duty; this was an unheard-of happening and
the shock waves ran through the education system. "Could it happen here?" 
asked many a Director of Education (Superintendent of Schools) and if they 
did not know what standards in the basic skills were in their primary 
schools they set about designing monitoring systems. The simplest monitoring 
scheme of course is a group testing programme, and as the Bullock Committee
had advocated regular monitoring of reading it was not too
hard to get this idea across to schools. Then in 1976 the Prime
Minister James Callaghan made a speech at Ruskin College in Oxford in
which he questioned the right of educationists to determine the direction
in which education was going without reference to other interested parties.
This was the start of the accountability debate in this country which was
related to issues of value for money in the new era of public expenditure
constraints: "We spend £5 billion a year on education so there will be
discussion" (TES 22.10.76) said Prime Minister Callaghan.

Not all these precipitating factors had the same impact in different
LEAs. Some LEAs cite political factors for setting up testing programmes -
the atmosphere in the mid to late 70s at the time of the Ruskin College
speech, the Black Papers and the William Tyndale affair resulting in
pressure from local politicians; others cite organisational factors - the
ending of the 11+, secondary school reorganisation and LEA reorganisation
all leading to a demand for information particularly relating to primary/
secondary transfer; yet others cite professional factors - concern over the
number of children referred for remedial help, both too large and too small,
and concern over reading standards following publication of the Bullock
Report. But whatever the initiating factors, by 1981 almost 80% of
all LEAs had a testing programme* (79 out of 104), with 1978 being the year
when most new testing programmes were introduced (Gipps et al, 1983).
Briefly, the situation with regard to who is tested on what is similar to
the USA (Wigdor and Garner, 1982), with most testing taking place at junior
(4th through 7th grades) level, most tests covering the basic skills in
reading, maths and language, and norm-referenced tests being more popular
than criterion-referenced tests.

* defined as "any testing of children in an age group which is organised
and promoted by the LEA as a matter of policy".







Regardless of reasons for introduction, most testing programmes are
used for a variety of purposes: monitoring - that is of overall standards
within an area and/or of individual schools; screening - that is to identify
children who are in need of special help or provision; and providing
information for transfer from junior to senior school, are the most widely
given reasons for testing, and not individually either. We were struck by
the range of reasons given for testing: LEAs seem to believe that testing,
and nearly always a single test, can satisfy several purposes. We received
comments such as "The tests are administered to monitor performance across
the authority, to indicate resource requirements and to enable decisions to
be made on appropriate curriculum". All this on the basis of scores on a
simple non-diagnostic reading test. Our findings indicated a lack of clear
thinking in LEAs as to why they had their testing programmes, which is a
disquieting fact in itself, but there are also technical limitations to the
efficiency with which the same test can be used simultaneously to monitor
and to screen. Fresh thinking about testing has perhaps been hard for LEAs
because the testing of reading has for so long been part and parcel of
schooling in the UK. What seems to have happened in some authorities is
that programmes which set out originally with screening as their main
purpose have, as often as not, had monitoring added, perhaps as a political
response and then, with the threat of cuts, had allocation of resources
added as well. The implementation of the 1981 Education Act in 1983,
which obliges LEAs to ensure that all children with special educational
needs are catered for adequately, preferably within mainstream schools,
emphasises the identification and assessment of children with special
educational needs, so we may see an expansion of testing for screening
purposes.

Last year (1983) we carried out a survey of screening programmes in
LEAs and found that around 70% (72 out of 104 LEAs) did indeed use
standardised tests to identify children with special educational needs
(and a further three LEAs are planning to introduce tests for screening/
identification purposes). These screening programmes are little different
in outward appearance from the accountability testing programmes we were
told about in 1981 and we have yet to find out whether the results are
used in any more rigorous a fashion. We found then that results of
monitoring programmes were not used to make schools or teachers accountable
in any hard-line way. There was no LEA which published 'league-tables',
i.e. named school results (though one LEA attempted to do this recently,
TES [date illegible]). The chief reason for this was that teachers, individually,
on testing panels convened by LEAs, and through their Unions made it
abundantly clear that league tables were not acceptable, and LEA officers
in turn did much to persuade Education Committee members (i.e. representatives
of the community and local politicians) not to ask for league tables. At
the school level we found, as did Leslie Salmon-Cox and colleagues in
Pittsburgh (1981) and Kellaghan and colleagues in Ireland (1982), that
teachers made little direct use of test scores themselves. Scores were put
into record books largely for the benefit of someone else, though of
course if the scores gave cause for concern teachers would act on them, but
by and large the feeling was that they were of use mostly to someone else -
the Head or the LEA. The need of both heads and local authorities to have
testing in order to keep a check on 'standards' (whatever that means) was
well accepted in the school system. In that situation the perceived need
is for norm-referenced tests for the comparisons that they make possible.
One is tempted to say that, given no-one expects results from this type of
testing to be of much use except for level-checking and comparison purposes,
the need is for tests which are as quick, simple and straightforward as
possible with perhaps less concern for maintaining reliability and validity.
We are not of course here talking about individual diagnostic testing such
as that carried out by educational psychologists or special education staff.



The mood now at local level has moved away from concerns with monitoring
and accountability to - in the face of expenditure cuts and falling rolls -
resource allocation (though it is far from clear how testing information
can help in this area) and now into the special educational needs area.

Special Educational Needs



Our own interests have also turned towards children with special educational 
needs and methods of identification of such children. The official
definition is of little help here*: 'a child has special educational needs
if he (sic) has a learning difficulty which calls for special educational
provision to be made for him' and 'a child has a "learning difficulty"
if ... he has a significantly greater difficulty in learning than the
majority of children of his age' (DES 1981).

As we have already said, many LEAs have fallen back on the good old 
standardised reading test as a first line of attack in their attempts to 
fulfil their now wide obligations to provide adequate support for all 
children with special educational needs. With the norm-referenced overtones 
of the official definition this may be appropriate but there is also a move 
towards using skills or curriculum-based assessments with precisely stated 
objectives, on the basis that clear objectives combined with feedback on 
progress are a necessary prerequisite for effective teaching (Cameron, 1982). 
This movement stems partly from the objectives approach of much special 
education teaching, partly from a genuine desire to develop assessment 
techniques which provide some feedback for the teacher and which she/he 
can actually use, and partly from changing models of provision for children
with special educational needs. The current wisdom is that, in the face
of doubts about the effectiveness of remedial teaching based on withdrawal
sessions by peripatetic staff, children with learning difficulties are
best helped by their regular teacher in the context of their own classroom.
The catch-phrase is 'All teachers are teachers of children with special
needs'. In this situation there is a need to provide the class teacher
with assessment materials that are curriculum-based and therefore help the
teacher to design and implement teaching programmes matched to each child's
special needs (Ainscow & Muncey, 1983). This seems a potentially
interesting development in what we might call 'useful' testing (as opposed
to the use-for-others testing which we described earlier) and our current
research is involved in investigating such new developments.

* It is an interesting aside here that Sir Cyril Burt, that much denigrated
English psychologist, also had trouble with vague definitions as far back
as 1921. In those days the statutory definition was 'incapable of receiving
proper benefit from instruction in the ordinary public elementary schools'
(Burt 1921, p16?). See Gipps & Goldstein (1984) for expansion of this theme.

These developments are in line with the recommendations about assessment
in the 1982 NAS panel report on Mild Mental Retardation classification/
placement.

"The fundamental assessment principle emphasized repeatedly
was educational utility. Information related to educational
decision making, especially that which leads to more effective
educational programming, was seen as worthwhile, beneficial ...
Messick's well-placed emphasis on assessing the regular education
program before or concurrent with initial referral as well as
development of interventions in regular education as a first step
is in line with current legal, legislative and professional
opinion. Moreover, fiscal realities, in addition to perceptions
of children's best interests, dictate greater use of interventions
within regular education instead of referring all (or even most)
problems to very expensive special education programs."

(Reschly, 1983)



Many of the issues raised by Lorrie Shepard in her NCME presidential
address last year are relevant here too; for example, the technical
inadequacy of tests used in assessment, the professionals' poor awareness
of the difference between an adequate and inadequate test, traditional
test choice preferences in the face of evidence of inadequacy (see
particularly Steadman & Gipps, 1984) and the widely felt need for norms
(Shepard, 1983).

National Monitoring

In England the Assessment of Performance Unit (APU) parallels the
American National Assessment of Educational Progress. The APU was set up
by the Department of Education and Science (DES) in 1974 but there had
been a considerable gestation period and its appearance was the result of
many of the same concerns which caused LEAs to set up their own monitoring
systems. 

From 1948 to 1964 the DES had commissioned regular national reading
surveys to be carried out by the National Foundation for Educational
Research. These showed that during the 16 years of the surveys there had
been advances of several months in the reading ages of 11 and 15 year olds.
The next survey was not conducted until 1970 and, unlike the previous
surveys, it did not show an advance in average reading ages. That survey
was bedevilled with problems which resulted in a sample which was
probably unrepresentative, and the test which had been used in previous
surveys was by then out of date. Nonetheless, these results caused a
furore in the world of education and beyond, and critics of progressive
education took it as evidence of a deteriorating system of state schooling.
One of the consequences was the setting up of the Bullock Committee, to 
which we have already referred, and the report's first recommendation was
for a system of national monitoring employing new instruments. This was
the first public indication that central government was interested in
national monitoring, but in fact discussions had been going on behind
the scenes for some time. As early as 1968 an internal DES paper
suggested a wide ranging testing programme as one means of assessing the
results of educational investment, and maths was singled out as an area
in which to start (DES, 1971). As we have seen, these discussions took
place against a backdrop of increasing concern over standards. In the
face of this the DES' lack of control over what went on in schools, in
spite of the fact that the DES funded schools and was ultimately accountable
for what went on in them, caused increasing concern amongst some officials.
A national scheme of monitoring would provide the DES with some means of
evaluating the performance of the education system directly (so the
reasoning went) and hence possibly with an indirect say in curriculum
content. It might also provide longed-for evidence to dispute the claims 
of those who argued that standards were falling (Gipps & Goldstein, 1983). 

However, the APU was actually announced in the context of government
moves to deal with educational disadvantage and the educational needs of
immigrants (DES, 1974) and the APU's role was to help to develop criteria
to identify educational disadvantage. This announcement caused few ripples
at the time since the move to deal with disadvantage and under achievement
was welcomed by educationists. The early publicity material put out by
the APU, however, had a different tale to tell: the APU's role was to
monitor in order to provide information on standards and how these change
over time. The educational climate in the mid 70s was, as we have seen,
one in which the professionals were being criticised, at least indirectly.
In this climate the APU became the focus of wider attention and suspicion
on the part of many of those in education, linked as it was inevitably with
the move towards greater accountability. Proposals to monitor standards
nationally were perceived to emanate from the political right and were
threatening to the teaching profession.

There were two main areas of concern. The first was that, though
ostensibly concerned with children's standards, the APU was really dealing
with teachers' competence. The second concern was its possible effect on
the curriculum, through the curriculum models adopted by the testers, the
increased importance of the areas tested (and by corollary the decreased
importance of areas not tested), and teaching to the test.

Before looking at what came of these concerns, a brief look at what 
the APU is and how it works. The APU is in fact a small unit within the 
DES. It oversees the surveying of performance, which is actually contracted 
out to the NFER, Leeds University and Chelsea College (University of London).
Each of the test development teams has an advisory group; there is a
Statistical Advisory Group which advises the Unit on technical matters,
and a Consultative Committee which is largely made up of non-DES people
and is representative of outside interests. This latter committee makes 
suggestions about policy matters and has been extremely influential. 

The APU does not test in as many areas as the NAEP; it covers language, 
maths, science and modern language, essentially a core curriculum, but it 
may come to include design and technology. It tests only at three ages 
11, 13 and 15 (and not at all ages in all subjects). Initially maths,
language and science monitoring was carried out annually for five years
and modern language for three years. This initial cycle of five annual
surveys ended in 1982 for maths, 1983 for language and 1984 for science;
the last of the three annual surveys in modern language will take place
in 1985. Maths, language and science will then be monitored every five
years; a decision about the future of modern language monitoring has yet
to be made. This rolling system of monitoring with maths, language and
science taking it in turns will have the function of updating the national
picture and identifying trends, while limiting the burden on schools and
reducing costs.

The take-up of published reports, particularly by teachers, has been
poor and this was a major theme of our 1982 evaluation (Gipps & Goldstein,
1983, op cit). In November 1982 the decision was made by the DES that
more emphasis be put on dissemination. Thus there is now a series of
occasional papers and a regular newsletter, rather in the style of the
NAEP newsletter. The DES has produced a booklet on the writing performance
of 15 year olds and the Association for Science Education, acting as the
APU's agent, is publishing a series of pamphlets on science performance
aimed at the classroom teacher.

Instead of publishing major reports at considerable expense, the emphasis
is now on short, easy to read booklets on specific areas aimed at a
specific audience. The APU has also commissioned independent evaluations
of the maths and language reports following the NAEP model; it is expected
that these will result in various documents for in-service training.

There has been continuing discussion within the Unit, its committees, 
groups and teams over the nature and extent of the background variables
which should be measured. Information of this sort is essential for the
interpretation of findings and to provide data of value to policy makers,
which is part of its task, that is, to identify differences in achievement
in relation to the circumstances in which children learn. The Statistical
Advisory Group has advised against the collection of several proposed
variables because of problems of measurement, while the Consultative 
Committee has been consistently against the collection of home background 
information from either parents or children. The current situation is that 
school-based background measures are being collected by the teams in their
surveys. However, composite measures of background (social and educational)
have limited potential for explaining performance at an individual level.
And, as Nuttall (1983) has pointed out, there can be little doubt that
in any case information on classroom processes and detailed curriculum
information is vital for interpretation of survey results. Such data is
not easy to obtain from large scale surveys but requires more intensive
in-depth studies. In the fallow four year period between surveys, the
teams will now have an opportunity to make in-depth studies, which were
promised when the work was first commissioned. At this point, however,
only in-depth analysis of existing data is involved; although in-depth
studies involving the collection of new data are possible this option has
not yet been taken up by the test development teams.

This problem is not restricted to the UK national assessment programme.
A comparison of the American, British and Australian monitoring programmes
by Power and Wood (1984, in press) concluded:

"There is no way in which a national assessment program of the type
developed could serve a social accountability function, given the
structure and politics of education in Australia, the UK and the US.
As well, in developing the programs political considerations proved
more important than clarification of objectives and of what would
be needed if these were to be met. As a consequence, the programs
developed into bland monitoring exercises of little direct
information value to policy makers and educators.

... As all three evaluations suggest, the picture would be clearer
and more readily interpretable if additional student and school
background and process data were collected and further research on
the instruments and follow-up studies were undertaken ..."






The other area in which the APU (and NAEP) has met problems is in
analysing and reporting changes in performance over time. There is no
consensus on how to analyse trends over time and this relates directly
to the issue of what one can say about standards (in terms of whether
they are rising or falling, which is what most people want to know).
As Nuttall (op cit) says "Finally, the measurement of change over
time: the only possible conclusion is that a satisfactory long-term
method has not been devised". NAEP, for example, has relied on using a
number of items that are common from one survey to the next to indicate
change, although ETS seems to be proposing to resurrect latent trait
models for this purpose - a proposal contemplated but now rejected by
the APU. The main problem with using a common core of items is that this 
method cannot provide a wholly representative sample of the items used in 
any particular survey and so the information thus provided on changes in 
performance over time is inevitably limited. The APU teams are also using 
some common items from one survey to another; for example, in maths half
the items were common in the first and last annual surveys. At the end of
the five year period of surveying, each team will produce composite
measures of performance over the five years which will serve as a baseline
(or standard) with which to compare performance measured subsequently in
the five-yearly surveys. By then, the question of how to analyse trends
in performance may have been answered in part. Certainly the Unit,
although it said much about standards in the early days, has not attempted
to define 'standards' in the sense of acceptable or looked-for performance
and will instead rely on describing measured performance over a period of
several years, a far less contentious and more acceptable task, and on
comparing relative changes between groups, e.g. sexes, over time
(Goldstein, 1983). The DES however, is not quite so circumspect: the 
pamphlet on the writing performance of 15 year olds was launched as a
contribution to the debate on standards "to trigger a public debate
about the content of English teaching and the standards needed" (TES,
By adhering to the principle of light sampling, anonymity of students
and schools, and the inclusion of teacher union representatives on its 
Consultative Committee, the APU has gone a long way towards allaying 
teachers' early fears. The extent to which the APU has carried the 
teachers with it can be illustrated by some findings of a teacher-interview 
survey we carried out in late 1982: approximately 70% of the primary and 
secondary heads interviewed (120) were in favour of national monitoring 
(Gipps et al, 1983, op cit) with accountability and the need to keep a
check on standards to the fore in their comments. 

The other early concern was about its impact on the curriculum, 
specifically its role in introducing a core curriculum and the 
curriculum backwash effect of the test items used. The APU's sampling and 
testing policies have prevented the curriculum backwash which would result 
from teaching to the test. The impact of the APU on the development of a 
core curriculum, however, cannot easily be separated from the influence of 
other factors in education. In 1982, when we wrote our evaluation of the 
APU, we felt that any impact there might be on the curriculum would be via 
the curriculum models adopted by the test development teams; the teams 
were aware of this and operated on a wide curriculum model so that any 
impact would be widening and not narrowing, and positive not negative. 
Indeed in 1982 there was a certain ambivalence on the part of the APU 
towards its role vis a vis the curriculum. The APU had been accused of 
being a Trojan horse to bring in an assessment-led curriculum; this 
however was a slightly paranoid view of the role of central government in 
the education system without sufficient awareness of the constraints on
it through the countervailing power of bodies such as the National Union
of Teachers. But the Unit, in order to allay fears, maintained that it
would not attempt to influence the curriculum via backdoor methods. That
ambivalence about its current role has now gone and one of the Unit's
current major aims is to milk its very detailed survey findings in order
to improve curriculum content and delivery, that is, teaching. It hopes
to achieve this via its new dissemination policy and by running in-service
courses for teachers and LEA subject advisers.

Indeed, the whole curriculum scene has changed over the last two 
years, since the Schools Council, the teachers' body responsible for 
development of the curriculum and examinations, has been disbanded and two 
new organisations have been set up - the School Curriculum Development 
Committee and the Secondary Examinations Council - with more DES control. 
Though there are no formal links between the APU and these two organisations, 
the APU data will be fed into their committees to help them in their early 
deliberations. Two particular areas of input are likely to be in helping 
to think about criteria for allocating grades in the new 16+ exams and in 
suggesting modes for examining. Of course, now the DES has the SCDC, it 
no longer needs the APU as a means of having some say in the curriculum. 



Within the Unit the emphasis now seems to have shifted away from a 
concern with information relevant to policy making and resource allocation. 
Instead it is in providing detailed information to guide teaching practice 
that the APU's profile seems to be highest. The incidence of low 
achievement, changes over time, policy decisions concerning resource allocation, 
making test items available to LEAs - these are all still on the agenda 
but one senses that they are no longer considered to be paramount. These 
areas are of course potentially far more problematic, particularly given 
the way the APU carried out its tasks prior to 198?. 




Current APU moves to disseminate its findings to improve the 
curriculum - by, it must be admitted, anything but backdoor methods - can 
be given a cautious welcome (and certainly the demand from LEAs and teachers 
for courses and conferences seems quite considerable). However, its 
future impact on the curriculum is uncertain and much will depend on the 
APU's links with the aforementioned new organisations - the SCDC and the 
SEC - and how these attempt to shape the curriculum.* 

New Developments in Public/School-leaving Examinations 

So far we have not mentioned the area of public examinations - those 
which students take at 16 and 18. There are two types of exam, the 
General Certificate of Education (GCE) Ordinary Level (at 16) and 
Advanced Level (at 18) which are meant for the top 20% of the population. 
For the less academic student there is the Certificate of Secondary 
Education (CSE) taken at 16 only. These exams are set by various 
examinations boards, independent bodies under the aegis of the Universities, 
except for one type of CSE exam 'Mode III' which is set by the student's 
own school but has to be approved by the relevant examinations board. 
With such a diverse system, there are bound to be questions over 
comparability and there is confusion over whether the grades awarded are 
norm-referenced or criterion based. In fact they are largely norm-based, 
i.e. the top x% always get Grade A, although some variation is allowed 
from year to year so as not to penalise an apparently unusually highly- 
scoring group, and there have been distinct trends over the years in some 
subjects. 

* Wood and Power (forthcoming) make the point that indeed, given the human 
and financial resources the APU has received, it is not surprising the APU 
has been able to produce superior test materials. Now that big curriculum 
reform projects have gone out of favour in the UK, the APU can be viewed as 
the "nearest thing to a curriculum reform project". 



Although the top grade in CSE has the same value as a pass grade in 
GCE 'O' level, the latter has become the qualification looked for by 
employers, and the CSE has as a result become devalued. There has in fact 
been considerable dissatisfaction with this dual system of examining and 
in 1976 the Schools Council submitted proposals to develop a single 
examination at 16+. In 19^3 the Government agreed to introduce such a 
single system subject to the creation of satisfactory national criteria 
for syllabuses, assessment procedures and the award of grades. A major 
exercise to develop such criteria is underway on the part of the exam 
boards, the SEC and other national bodies (Orr and Nuttall, 1983). 

The present Secretary of State for Education has brought in the 
notion of grade-related criteria: "national criteria must be established 
to ensure that . . . all boards apply the same performance standards to 
the award of grades" (Orr and Nuttall, op cit). This development, which 
has not yet been completed, is part of a more general trend to move away 
from purely norm-referenced testing towards criterion-referenced testing - 
which attempts to specify more precisely what a student can actually do. 
The attraction of criterion-referenced testing is that it can have a 
positive value for all students, since it is a record of what can actually 
be done. Nevertheless, the practical pressures to aggregate a large number 
of criterion-referenced assessments for purposes of selection and so on 
are likely to leave us with many problems - not the least of which is the 
requirement for comparability. Indeed, the distinction between 
norm-referenced and criterion-referenced testing is widely misunderstood in 
the UK (see Black et al, 1984) and it seems likely that the present high 
level political advocacy of criterion-referenced testing and its 
acceptance by much of the teaching profession is based on a 
misunderstanding of its nature and potentialities (Goldstein, 1984). 
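
The distinction can be made concrete with a small sketch. It is illustrative only: the marks, the cut score of 65 and the function names are our own assumptions and do not describe the procedure of any examinations board. Under norm-referencing a fixed fraction of the cohort receives Grade A whatever marks they obtain; under criterion-referencing everyone who reaches a pre-specified standard receives it.

```python
def norm_referenced(scores, top_fraction):
    """Award 'A' to the top `top_fraction` of the cohort, whatever their marks."""
    n_top = round(len(scores) * top_fraction)
    # Rank candidates from highest to lowest mark and keep the first n_top.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n_top])
    return ["A" if i in top else "below A" for i in range(len(scores))]


def criterion_referenced(scores, cut_score):
    """Award 'A' to every candidate whose mark reaches the fixed cut score."""
    return ["A" if s >= cut_score else "below A" for s in scores]


# Two hypothetical cohorts: the second scores 10 marks higher throughout.
weaker = [35, 42, 48, 51, 55, 58, 60, 63, 67, 72]
stronger = [s + 10 for s in weaker]

for name, cohort in (("weaker", weaker), ("stronger", stronger)):
    n_norm = norm_referenced(cohort, 0.2).count("A")
    n_crit = criterion_referenced(cohort, 65).count("A")
    print(name, "norm-referenced As:", n_norm, "criterion-referenced As:", n_crit)
```

With these invented marks the norm-referenced rule awards two Grade As in each cohort, whereas the criterion-referenced rule awards two in the weaker cohort and six in the stronger one. The second rule records what candidates can do, but it no longer guarantees a fixed proportion for selection purposes.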

One particular type of criterion-referenced test has existed in this 
country for many years - the graded test. The best known example of 
graded testing is that of music, though there are now moves underway to 
develop graded tests in other areas, for example, foreign languages 
(which has actually been going on for some time) and English. It is 
likely, however, that subjects like maths lend themselves more readily to 
graded testing (with pre-specified criteria) than subjects like English 
(Nuttall and Goldstein, forthcoming). 

Another approach is that of profiling in which an individual's results 
in a subject are reported in the form of a profile which specifies levels 
of attainment in each of a range of skills (see Mortimore and Mortimore, 1984, 
for a review of profiling and graded tests). However, this approach is 
not without its measurement problems either; as Nuttall and Goldstein (op 
cit) point out, one of the more serious of these is how to deal with 
aggregating very detailed assessments of individual attributes. At 
another level, Her Majesty's Inspectors are concerned that schools will 
seize on profiles and use them without careful planning, not to mention 
dealing with the issue of comparability between schools (Education ?9.?.83). 

There is also the danger that profiles and graded tests will be used 
only for the bottom 40% of the ability range. Indeed, at the end of 
198? the Government made available £% million for the development of 
graded tests in maths for lower attaining pupils (DES Press Notice ?G8/8?). 
One is here driven to question the motives of a Government which is 
encouraging the development of an untried examination system on the one 
hand, and the development of graded tests for a particular section of the 
population on the other. 

One thing all these new developments have in common is the desire to move 
towards an examining system which tells us something about what students 
can do in specific terms. This parallels the move in special/remedial 
education towards curriculum- and teaching objectives-based assessment. 
The underlying requirement is that test scores and exam results should 
carry more information with them than they do at present. It would be by 
no means a bad thing if these scores and results were to be more useful, 
and therefore used more, than scores from norm-referenced standardised 
tests. One of the challenges to those concerned with educational 
measurement is in finding ways in which such sets of more detailed 
information can be conveyed informatively. 

In January of this year the Education Secretary made a major speech on 
future educational policy which received a warm welcome from many in 
education. This significant speech, known as the 'Sheffield speech', 
emphasised the need to raise standards and outlined the changes required 
in examining and the curriculum in order to achieve this rise. The 
Secretary of State gave as his objective bringing 80-90% of all 16 year 
old pupils at least (his emphasis) up to the level now associated with 
that grade in CSE which is currently achieved by average pupils. He 
reiterated his call for a greater degree of criterion-referencing in 
public exams, and for explicit definitions of the objectives of each 
phase, and of each subject area, of the curriculum. Explicitly defined 
curricular objectives increase teacher expectations, so his argument went, 
and high expectations based on defined objectives motivate pupils to give 
of their best. And - echoes of the then Prime Minister's Ruskin College 
speech in 1976 - "There would be a further gain if defined curricular 
objectives were not only broadly agreed by all the partners in the 
education service but were also shared by those who use it and pay for 
it - parents, employers, and the tax and ratepaying public" (our emphasis). 
Although the emphasis on standards and value for money is much the same 
as it was in 1976, something has changed: the current view about who has 
a right to be involved in the curriculum. "There is now no serious 
dispute that the school curriculum is a proper concern not only of the 
teachers, but also of parents, governing bodies, LEAs and the 
Government ..." (Education 15.1.84). 

Testing, and its inevitable companion the curriculum, has come a 
long way in the last ten years. 



REFERENCES 



AINSCOW M & MUNCEY J (1983) All Teachers are Teachers of Children with 
Special Needs, Elm Bank Teachers' Centre, Coventry LEA. 

BLACK P, HARLEN W & ORGEE T (1984) Standards of Performance - Expectations 
and Reality, APU Occasional Paper, London: DES. 

BURT C (1921) Mental and Scholastic Tests, London: P S King & Sons. 

CAMERON R J (198?) Teaching and Evaluating Curriculum Objectives, 
Remedial Education 17 pp 102-108. 

DES Report of the Working Group on the Measurement of Educational 
Attainment. 

DES (1974) Educational Disadvantage and the Educational Needs of 
Immigrants, Cmnd 5720, London: HMSO. 

DES (1981) Education Act 1981, London: HMSO. 

GIPPS C & GOLDSTEIN H (1983) Monitoring Children: An Evaluation of the 
Assessment of Performance Unit, London: Heinemann Educational Books. 

GIPPS C & GOLDSTEIN H (1984) Twenty per cent with special needs: another 
legacy from Cyril Burt? (in press) 

GIPPS C, STEADMAN S, BLACKSTONE T & STIERER B (1983) Testing Children: 
Standardised Testing in Local Education Authorities and Schools, 
London: Heinemann Educational Books. 

GOLDSTEIN H (1983) Measuring Changes in Educational Attainment over Time: 
Problems and Possibilities, J Educational Measurement, 20, 4. 

GOLDSTEIN H (1984) Models for Equating Test Scores and for Studying the 
Comparability of Public Exams, in Assessing Educational Achievement, 
Nuttall, D. (Ed), Falmer Press (in press). 

KELLAGHAN T, MADAUS G & AIRASIAN P (1982) The Effects of Standardised 
Testing, Boston: Kluwer-Nijhoff Publishing. 

MORTIMORE J & MORTIMORE P (1984) Secondary School Examinations: the helpful 
servants, not the dominating master, Bedford Way Paper No 18, 
London: University of London Institute of Education. 

NUTTALL D L (1983) Monitoring in North America, Westminster Studies in 
Education, Vol 6, 1983. 

NUTTALL D L & GOLDSTEIN H (1984) Profiles and Graded Tests: the technical 
issues, in Profiles in Action, London: Further Education Unit (forthcoming). 

ORR L & NUTTALL D L (1983) Determining standards in the proposed single 
system of examining at 16+, Comparability in Examinations Occasional 
Paper 2, London: Schools Council. 

POWER C & WOOD R (1984) National Assessment: A Review of Programs in 
Australia, United Kingdom and United States, Comparative Education 
Review (in press). 



RESCHLY D J (1983) Comments on the National Academy of Sciences Report on 
Mild Mental Retardation Classification/Placement, paper presented to 
AERA Annual Meeting, Montreal, 1983. 

SALMON-COX L (1981) Teachers and standardised achievement tests: what's 
really happening?, Phi Delta Kappan, May 1981. 

SHEPARD L (1983) The Role of Measurement in Educational Policy: Lessons 
from the Identification of Learning Difficulties, Educational 
Measurement: Issues and Practice, Fall 1983. 

STEADMAN S & GIPPS C (198-) Teachers and Testing: pluses and minuses, 
Educational Research (in press). 

WIGDOR A & GARNER W (Eds) (1982) Ability testing: Uses, consequences 
and controversies, Washington DC: National Academy Press. 

WOOD R & POWER C (forthcoming) National Assessments and 'standards': 
for better or worse? 


